You can download SPSS from the KCL Software Hub. You will need to log in with your KCL credentials to access the software.
Search for “IBM SPSS Statistics”, select your platform (Windows / Mac), and add it to your cart.
Next, check out and enter your details; you will then be able to download the software.
Once you have downloaded the software, you will need to install SPSS on your computer.
The aim of this section is to familiarise ourselves with the SPSS environment. The SPSS data editor contains two views which you can switch between using the tabs at the bottom left of the screen (figure 1).
Figure 1
Somewhat confusingly, SPSS sometimes refers to the p-value as ‘Sig.’. The p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true, that is, if only chance were at work. If p (the probability) is low, we say the result is significant. In Education and many social sciences, the ‘cut-off’ point is often 0.05, and we refer to this cut-off as ‘alpha’.
Table 1. Explanation of columns in SPSS variable view
| Column title | What it means |
|---|---|
| Name | This column provides the name of the variable. Older versions of SPSS were limited to 8-character names, which is why you often find rather intriguing names for variables in data sets. New versions of SPSS are not limited to 8 characters, but lengthy descriptions should not be included in the Name. They go in the Label column. |
| Type | This column indicates the type of variable that is reflected in this particular row. There are 8 options to choose from: Numeric, Comma, Dot, Scientific notation, Date, Dollar, Custom currency, and String. Most variables beginning users will encounter are either Numeric or String variables. Numeric variables are numbers that either represent a value (e.g., 1 = Catholic) or are the value of interest (height = 73 inches). String variables are text and can only be treated as such. As a result, very few manipulations can be performed on them in SPSS. |
| Width | The number of digits displayed for numerical values or the length of a string variable. |
| Decimals | This column allows you to control the number of characters after the decimal place. |
| Label | This column allows you to provide a more extensive description of the variable. |
| Values | This column allows you to provide a key for what the numbers of a numeric variable may represent (e.g., 1 = Catholic, 2 = Protestant). |
| Missing | This column allows you to indicate whether there are any missing values in a variable. Values marked as missing are excluded from analyses in SPSS. |
| Columns | The width of each column in the Data View spreadsheet. Note that this is not the same as the number of digits displayed for each value. This simply refers to the width of the actual column in the spreadsheet. |
| Align | This column indicates the alignment of the variable in the Data View. |
| Measure | This column indicates the level of measurement of the variable. There are three from which you can choose: Nominal, Ordinal, and Scale. |
| Role | The role that a variable will play in your analyses (i.e., independent variable, dependent variable, both independent and dependent). Some options in SPSS allow you to pre-select variables for particular analyses based on their defined roles. Any variable that meets the role requirements will be available for use in such analyses. It is not recommended that you change this, at least not as a novice. |
Source one and Source two
Click on ‘file’, then ‘new’, and then ‘data’ to open a blank data editor.
Give your variables the following characteristics (in the variable view).
| Name | Type | Width | Decimals | Label | Values | Missing | Columns | Align | Measure |
|---|---|---|---|---|---|---|---|---|---|
| IDnumber | Numeric | 8 | 0 | Participant ID | None | None | 8 | Right | Scale |
| Gender | Numeric | 8 | 0 | Participant Gender | 1 = Male, 2 = Female | None | 8 | Right | Nominal |
| IQ | Numeric | 8 | 0 | IQ Score | None | None | 8 | Right | Scale |
Note: type, width, columns, align can often be left as the default. Also, changing the decimal value will not alter the information you input if you only input whole numbers.
Try entering the following data into SPSS (in the data view).
| IDnumber | Gender | IQ |
|---|---|---|
| 1 | Male | 105 |
| 2 | Female | 110 |
| 3 | Female | 112 |
| 4 | Female | 102 |
| 5 | Male | 100 |
| 6 | Male | 120 |
| 7 | Female | 98 |
| 8 | Male | 103 |
| 9 | Female | 128 |
| 10 | Male | 110 |
Remember that 1 = Male, 2 = Female for Gender
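The relationship between the numeric codes you type in the Data View and the value labels you defined in the Variable View can be sketched in a few lines of Python. This is only an illustration of the idea, not something SPSS requires; the names `GENDER_LABELS` and `cases` are made up for this example.

```python
# Value labels from the Variable View: numeric code -> meaning.
GENDER_LABELS = {1: "Male", 2: "Female"}

# The ten cases from the table above, stored as (IDnumber, Gender code, IQ).
# Gender is entered as a number, exactly as you would type it in the Data View.
cases = [
    (1, 1, 105), (2, 2, 110), (3, 2, 112), (4, 2, 102), (5, 1, 100),
    (6, 1, 120), (7, 2, 98), (8, 1, 103), (9, 2, 128), (10, 1, 110),
]

# Look up the label for each stored code, much as SPSS does
# when value labels are switched on in the Data View.
for id_number, gender_code, iq in cases:
    print(id_number, GENDER_LABELS[gender_code], iq)
```

Storing the code rather than the text keeps Gender a Numeric variable, which is what allows it to be used as a grouping variable later on.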
Finally, locate the Output Window, which is empty at the moment.
**You may want to save your data file somewhere you can find it (e.g., on your desktop for now) using the file name section1.sav.**
The first aim of this task is to obtain descriptive statistics such as means, standard deviations, frequencies and range of various variables. Download the data file PISA_2022_london.sav and open it in SPSS.
The data are from the OECD’s PISA 2022 survey – a survey of nearly 700,000 fifteen-year-old students that examines their performance in maths, science and reading and collects other data about their schooling. The full data set has been cut down to make things easier – we have included only some of the variables and only included the data for students in London (777 students). You will find the following variables in the data:
| Item | Description |
|---|---|
| ST004D01T | Gender: Male / Female / NA. |
| PV1MATH | Mathematics test scores (0-1000) |
| PV1READ | Reading test scores (0-1000) |
| PV1SCIE | Science test scores (0-1000) |
| HOMEPOS | A measure of wealth (home possessions), normalised to a mean of 0 and a standard deviation of 1 |
| ESCS | A measure of social class, normalised to a mean of 0 and a standard deviation of 1 |
| OCOD1 | Mother’s occupation |
| OCOD2 | Father’s occupation |
| ST253Q01JA | How many digital devices in your home? |
| ST016Q01NA | How satisfied are you with life (/10) |
| IC180Q01JA | Agree/disagree: I trust what I read online |
| PA185Q08JA | Agree/disagree: At home, we discuss the books we are reading |
| life_sat | Satisfied / Not-satisfied |
777 students
Categorical (it is a gender variable)
Ordinal - it is a count of digital devices in the home (the responses are: There are no devices, one, two, three etc)
There are a number of ways of obtaining information describing data using SPSS. In this exercise you will use the Descriptives option in the Analyze ➝ Descriptive statistics menu (see Figure 2.1). The Descriptives option should be used to generate descriptive information about continuous variables using all cases in the data file. The Frequencies option should be used to generate descriptive information about categorical variables using all cases in the data file.
Figure 2.1
Assume we are required to produce descriptive information such as the mean math score by gender for all students in the sample. To do this, first select the Analyze pull down menu. Choose the Descriptive statistics option. You are presented with a further menu where you should click on Descriptives.
You are then presented with a window showing a list of variables in the PISA_2022_london.sav data file (see figure 2.2). First select the continuous variables you want to describe, such as PV1MATH (by clicking on the variable in the list and then clicking the arrow button). Now you have to choose what type of statistics you wish to generate. To do this click on the grey Options button.
You can now choose various descriptive statistics which describe the nature of your data. To select an option, click on the word and a cross will appear in its box indicating that it is selected. Select these statistics: Mean, Standard Deviation, Minimum, Maximum, and Range. Once these are selected click on the Continue button. Then click OK.
Figure 2.2
You are then presented with the output screen shown in figure 2.3. The Descriptives you requested will have been generated in this window. You can look at your output by moving about in the window using the arrow keys on the keyboard or the Page up / Page down buttons. You can also use the mouse and the “Scroll Bar” on the right hand edge of the output window to move around.
**You may want to save your output in the same place you saved the previous work, using the filename section2.spv, but do not close this window once you have saved it.**
Figure 2.3
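To see what the Descriptives output is calculating, here is a minimal Python sketch using only the standard library. Because the PISA values are not reproduced in this guide, it uses the ten IQ scores from the data entry exercise instead. It assumes the standard deviation is the sample standard deviation (n − 1 in the denominator).

```python
import statistics

# IQ scores for the ten participants entered in the first exercise.
iq = [105, 110, 112, 102, 100, 120, 98, 103, 128, 110]

mean = statistics.mean(iq)
sd = statistics.stdev(iq)        # sample SD (n - 1 denominator)
minimum = min(iq)
maximum = max(iq)
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"N = {len(iq)}")
print(f"Mean = {mean:.2f}, Std. Deviation = {sd:.3f}")
print(f"Minimum = {minimum}, Maximum = {maximum}, Range = {value_range}")
```

If you run Descriptives on the IQ variable in section1.sav, the figures in the output table should agree with these, apart from rounding.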
Task 2: More descriptive statistics
Mean = 505.16938 Standard deviation = 101.644069
Reading (875.468)
From 1 to 8
Digital Devices; N=639, HOMEPOS, N=646
Assume we are required to produce other descriptive information such as the number and percentage of students who trust information online (IC180Q01JA), the percentage of boys and girls (ST004D01T) and the percentage with different numbers of digital devices (ST253Q01JA). To do this, again select the Analyze pull down menu. Choose the Descriptive statistics option, but this time click on Frequencies in the further menu.
You are again presented with a window showing a list of variables in the PISA_2022_london.sav data file (see figure 2.4). First select the variables you want to look at. Here select ST004D01T (Gender), IC180Q01JA (Trust in the internet), and ST253Q01JA (Digital devices) by clicking on each variable in the list and then clicking the arrow button. SPSS will produce counts and percentages by default for this analysis.
Figure 2.4
Click on OK and you will go straight to the output window where your information is generated. You may have some information in this window from the previous task so be aware of this. The new output provides frequency counts of the data for each of the variables you selected and also a series of percentages (see Fig 2.5). Note that for this analysis Percent and Valid Percent are the same as there are no missing data for these variables.
Figure 2.5
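The difference between the Percent and Valid Percent columns can be sketched in Python. This is an illustration only: the first ten responses are the Gender codes from the data entry exercise, and the two `None` entries are hypothetical missing responses added so that the two columns differ.

```python
from collections import Counter

# Gender codes (1 = Male, 2 = Female) for the ten cases entered earlier,
# plus two HYPOTHETICAL missing responses (None) added for illustration.
responses = [1, 2, 2, 2, 1, 1, 2, 1, 2, 1, None, None]
labels = {1: "Male", 2: "Female"}

counts = Counter(r for r in responses if r is not None)
n_total = len(responses)       # all cases, including missing
n_valid = sum(counts.values()) # cases with a valid response

table = {}
for code in sorted(counts):
    table[labels[code]] = {
        "frequency": counts[code],
        # Percent uses every case in the file as the denominator.
        "percent": 100 * counts[code] / n_total,
        # Valid Percent uses only the non-missing cases.
        "valid_percent": 100 * counts[code] / n_valid,
    }

for label, row in table.items():
    print(f"{label}: n = {row['frequency']}, "
          f"Percent = {row['percent']:.1f}, Valid Percent = {row['valid_percent']:.1f}")
```

When a variable has no missing values, `n_total` and `n_valid` are equal and the two percentages coincide.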
Save your output again (overwrite the previous section2.spv as long as your output file includes the data generated for both task 2 and 3).
Task 3 Frequencies
44.9% Female; 55.1% Male
2.1% strongly agree
The researcher who collected the data was then interested in whether there were any associations between some of the variables studied in the survey. In particular, the researcher was interested in examining the associations between two of the categorical variables in the questionnaire. In order to do this Chi-square tests were required.
The hypothesis that the researcher wanted to test:
H1: Life satisfaction is associated with gender
To carry out a Chi-square analysis, click on Descriptive statistics in the Analyze pull down menu. Then select Crosstabs. You will see the variables in the file PISA_2022_london.sav listed in the Crosstabs window on the left (see Fig 3.1 and 3.2). Put the dependent variable (DV) into the Row(s): box and the independent variable (IV) into the Column(s): box (remember it is the IV that affects the DV, not the other way round). Once you have selected the two variables, click on the Statistics button (on the side of the window). Fig 3.3 will appear; here select the Chi-square option. Then click on Continue. Click on the Cells button and Fig 3.4 will appear. In the Counts section click on the Expected selection; the Observed selection should already be selected. This ensures that both the expected and the observed frequencies are shown in each cell of the Chi-square contingency table. Also in this window, select the Percentages: Column option. Click on Continue then OK to generate your output.
Figure 3.1, 3.2, 3.3, 3.4
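The expected counts and the Pearson Chi-square value that Crosstabs reports can be reproduced by hand. The sketch below is a minimal illustration in standard-library Python. The observed counts are hypothetical (they are not the PISA figures), and the p-value shortcut using `math.erfc` is only valid for a 2 × 2 table, where there is 1 degree of freedom.

```python
import math

# HYPOTHETICAL observed counts, laid out as in Crosstabs:
# rows = life satisfaction (DV), columns = gender (IV).
observed = [
    [70, 55],  # satisfied:     boys, girls
    [30, 45],  # not satisfied: boys, girls
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for a cell = (row total * column total) / grand total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson Chi-square = sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi_square_p_value_df1(value):
    """p-value for a Chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(value / 2))


p_value = chi_square_p_value_df1(chi_square)

# Column percentages: share of each gender that is satisfied.
pct_satisfied = [100 * observed[0][j] / col_totals[j] for j in range(2)]

print(f"Expected counts: {expected}")
print(f"Chi-square({df}) = {chi_square:.3f}, p = {p_value:.4f}")
print(f"Satisfied: boys {pct_satisfied[0]:.1f}%, girls {pct_satisfied[1]:.1f}%")
```

Note that for 2 × 2 tables SPSS also prints a continuity-corrected value; the calculation here corresponds to the uncorrected Pearson Chi-Square row.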
In hypothesis testing we use the p-value to decide whether or not to reject the null hypothesis. If the p-value is greater than 0.05 (written as p>0.05) we do not reject the null hypothesis, and so have no evidence that girls and boys differ in their life satisfaction. If the p-value is less than 0.05 (written as p<0.05) we reject the null hypothesis and treat H1 as supported. The p-value for the chi-square test is highlighted below.
Q10) Describing what you have found - Use the data in your output window to complete the statement below:
___________ percent of boys were satisfied with their life and _________ percent were dissatisfied. ____________ percent of girls were satisfied with their life and _____________ percent were dissatisfied. Compared to boys, girls were more likely to be satisfied/dissatisfied (delete as appropriate) with their lives.
This association was/was not (delete as appropriate) statistically significant (p<0.05/p>0.05) (delete as appropriate).
Q11) (Advanced) What is the null hypothesis for the test that you have conducted?
Q12) (Advanced) Using the data in your output window, report the results of the chi-square test below. Report the results like this: χ² (degrees of freedom) = Pearson Chi-Square value, p value
χ² (______) = __________ , ___________
Save your output as section3.spv.
70.2 percent of boys were satisfied with their life and 29.8 percent were dissatisfied. 56.2 percent of girls were satisfied with their life and 43.8 percent were dissatisfied. Compared to boys, girls were more likely to be dissatisfied with their lives.
This association was statistically significant (p<0.05).
Q11) (Advanced) What is the null hypothesis for the test that you have conducted?
Boys and girls report the same level of life satisfaction
Q12) (Advanced) Using the data in your output window, report the results of the chi-square test below. Report the results like this: χ² (degrees of freedom) = Pearson Chi-Square value, p value
χ² (1) = 13.193, p < 0.01
There are two types of t-test that look at the difference between two groups or conditions. These are the paired-samples t-test (within/related subjects) and the independent-samples t-test (between/unrelated subjects). We are going to look at both types using the Wellbeing.sav data file.
You will find five different variables in the Wellbeing.sav data file. The data comes from a study into the wellbeing of university students in Colombia which measured their life satisfaction with a survey before and after a course on wellbeing. The data also include a group of control students who didn’t take the course.
The variables in the data set are:
| Variable | Description |
|---|---|
| ID | A participant unique identifier |
| Gender | The Gender of the student (1 = Male / 2 = Female) |
| Group | Whether the student was in the intervention or control group (Intervention / Control) |
| Life_sat_pre | Respondents’ reports of their life satisfaction (/10) before the intervention |
| Life_sat_post | Respondents’ reports of their life satisfaction (/10) after the intervention |
Here we are going to test whether or not there is a statistically significant difference between students’ life satisfaction before the intervention (Life_sat_pre) and after it (Life_sat_post).
Choose the Compare Means & Proportions option in the Analyze pull down menu. Then select the appropriate type of t-test (Paired-Samples T Test). You will be presented with a window (See fig 4.1).
Figure 4.1
Click on the two repeated measures variables (a repeated measures variable is one measured more than once, i.e. it is repeated) and transfer them into the Paired Variables: box. Then simply click on OK. There you have it. Your t-test will appear in the output window.
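What the paired-samples t-test computes can be sketched in standard-library Python. The scores below are hypothetical (six invented students, not the Wellbeing.sav data), and the sketch assumes the difference is taken as the first variable minus the second, so the t value is negative when post-test scores are higher. It calculates only the t statistic and degrees of freedom; the p-value is left to SPSS.

```python
import math
import statistics

# HYPOTHETICAL life satisfaction scores (/10) for six students.
pre = [6, 7, 5, 8, 7, 6]
post = [7, 8, 6, 8, 9, 7]

# A paired test works on the within-person differences (pre minus post).
differences = [a - b for a, b in zip(pre, post)]
n = len(differences)

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample SD of the differences
standard_error = sd_diff / math.sqrt(n)

t_statistic = mean_diff / standard_error
df = n - 1  # degrees of freedom = number of pairs - 1

print(f"Mean difference = {mean_diff:.3f}")
print(f"t({df}) = {t_statistic:.3f}")
```

Because the degrees of freedom equal the number of pairs minus one, you can work out how many students contributed to a paired test from the df reported in the output.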
Yes, the mean of the pre-test is 7.0549 and the post test 7.7134
The p-value is <0.01
That the means of the pre- and post-test groups are equal
Yes, the p-value is <0.01 so the null hypothesis can be rejected. Fill in: t(327) = -7.973, p < 0.01
Independent-samples t-test
We are now going to test whether there is a difference in life satisfaction between males and females before the intervention (an independent t-test) and then after it. To run the t-test, first select the Compare Means & Proportions option in the Analyze pull down menu. Then select the relevant t-test. Once the Independent-samples t-test window is presented, move the dependent variables into the Test Variable(s): box. Then move the variable that defines the groups we wish to compare (Gender) into the Grouping Variable: box. You then have to tell SPSS which groups within the grouping variable you wish to compare. Here, in Gender, there are only two groups (male and female). Click on the Define Groups button and, if the values are not entered automatically, enter 1 (male) in the Group 1: box and 2 (female) in the Group 2: box.
Figure 4.2
Now click continue. Then click on OK and your tests will run.
Interpreting the outputs
The output table shows the results from both t-tests: one comparing the life satisfaction of males and females on the pre-test (on top) and one comparing the life satisfaction of males and females on the post-test. Use the p-values in the column labelled ‘Sig. (2-tailed)’ and the row ‘Equal variances not assumed’ (the bottom row in each box) to answer the questions below. This is a more cautious test.
On the pre-test Males (“1”) score higher (mean = 7.4565) than females (“2”) (mean = 6.7632). On the post-test Males (“1”) also score higher (mean = 7.8986) than females (“2”) (mean = 7.5789).
On the pre-test the p-value is <0.01 - the result is significant.
On the post-test the p-value for a two-tailed test with equal variance not assumed is 0.45 - the result is not significant.
The means of the wellbeing scores for males and females are the same
On the pre-test the p-value is <0.01 - the result is significant.
On the post-test the p-value for a two-tailed test with equal variance not assumed is 0.45 - the result is not significant.
Male students had higher wellbeing than female students on the pre-test (mean = 7.4565 vs 6.7632) - this difference was statistically significant (t(326) = 4.194, p<.001). However, this difference was not significant (t(326) = 2.016, p<.045) on the post-test (mean = 7.8986 vs 7.5789).
One of the assumptions of an independent t-test is that we have homogeneous (similar) variances in both groups. If we violate this assumption, the results of our t-test may be invalid. Therefore, we must test whether this assumption has been violated before interpreting our t-test. This is done with the Levene’s Test.
A significant result (i.e. p<0.05) for our Levene’s test means we have violated the assumption. In which case, we must use an ‘adjusted’ t-test. This is given on the row labelled “equal variances not assumed” in the SPSS output.
However, if we have not violated the homogeneity of variances assumption (i.e. Levene’s is p>0.05) we report the t-test results on the row labelled “equal variances assumed”.
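The two rows of the independent-samples output correspond to two slightly different calculations, sketched below in standard-library Python. The scores are hypothetical, the function names are invented for this example, and the Levene’s test p-value is taken as an input rather than calculated (SPSS reports it for you). The ‘adjusted’ test is assumed here to be Welch’s t-test with Welch–Satterthwaite degrees of freedom.

```python
import math
import statistics


def pooled_t(a, b):
    """t and df for the 'Equal variances assumed' row (pooled variance)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


def welch_t(a, b):
    """t and df for the 'Equal variances not assumed' row (Welch's test)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    se_squared = v1 / n1 + v2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation; df is usually not a whole number.
    df = se_squared ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def row_to_report(levene_p, alpha=0.05):
    """Pick the output row based on the Levene's test p-value."""
    if levene_p < alpha:
        return "Equal variances not assumed"
    return "Equal variances assumed"


# HYPOTHETICAL scores for two small groups.
males = [8, 7, 9, 6, 8]
females = [6, 7, 5, 7, 6, 5]

t_pooled, df_pooled = pooled_t(males, females)
t_welch, df_welch = welch_t(males, females)

print(f"Equal variances assumed:     t({df_pooled}) = {t_pooled:.3f}")
print(f"Equal variances not assumed: t({df_welch:.2f}) = {t_welch:.3f}")
print(row_to_report(levene_p=0.03))
```

The non-integer degrees of freedom in the second row of the SPSS output are a sign that this adjustment has been applied.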
Access the data file: PISA_2018_enviro_UK.sav
The file contains responses to items on the OECD’s PISA 2018 survey, which samples the views of 15-year-old students in 80 countries – the data have been filtered to include only students from 10 randomly selected schools in the UK (N = 226). The survey includes items on the wealth and social class of respondents, and five items about pro-environmental behaviours:
| Variable | Description |
|---|---|
| ST222Q01HA | I reduce the energy I use at home […] to protect the environment. |
| ST222Q03HA | I choose certain products for ethical or environmental reasons, even if they are a bit more expensive. |
| ST222Q04HA | Involved in: I sign environmental or social petitions online |
| ST222Q06HA | I boycott products or companies for political, ethical or environmental reasons |
| ST222Q09HA | I participate in activities in favour of environmental protection |
The questions have yes/no responses, which are scored as Yes = 1, No = 0. Each student’s responses to all five questions are summed to give a total pro-environmental behaviour score. In PISA_2018_enviro_UK.sav the following variables are available:
| Variable | Description |
|---|---|
| ST004D01T | Gender: Male / Female / NA |
| PV1MATH | Math score |
| PV1SCIE | Science Score |
| PV1READ | Reading Score |
| HOMEPOS | Measure of wealth – normalised to a mean of 0 and a standard deviation of 1 |
| ESCS | Measure of social class – normalised to a mean of 0 and a standard deviation of 1 |
| Enviro | Score (/5) for pro-environmental behaviours |
We are interested in determining if there are correlations between students’ wealth and social class, and their pro-environmental behaviours. A correlation is a measure of the extent to which two variables change in the same way.
To carry out a bivariate correlation, click on the Analyze pull down menu and then click on the Correlate option. Then select Bivariate. The window presented in Figure 5.1 will appear. On the left hand side of the window you will see the list of the variables in the data file. Select the variables and click on the OK button.
For example, to determine the correlation between wealth (HOMEPOS) and the environmental score (Enviro), use the blue arrow to add the two variables to the right hand variable box.
Figure 5.1
Note that a minus sign in the correlation coefficient indicates a negative correlation – that means as one variable increases, the other decreases.
ρ = .682 (strong positive correlation) p < .001
ρ = .125 (very weak positive correlation) p = .60 (not significant)
ρ = .072 (very weak positive correlation) p = .283 (not significant)
ρ = .083 (very weak positive correlation) p = .216 (not significant)
To interpret correlations, the following guidelines are often used:
| Strength | Correlation |
|---|---|
| Very weak | 0.00-0.19 |
| Weak | 0.20-0.39 |
| Moderate | 0.40-0.59 |
| Strong | 0.60-0.79 |
| Very strong | 0.80-1.00 |
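The guideline table can be applied mechanically once you have a coefficient. The sketch below, in standard-library Python, calculates a Pearson correlation by hand and labels its strength using the bands above. The function names and the small data set are hypothetical, and values falling between two bands (for example 0.195) are assigned to the lower band.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r):
    """Label a coefficient using the strength guidelines in the table above."""
    size = abs(r)  # strength ignores the sign
    if size < 0.20:
        strength = "very weak"
    elif size < 0.40:
        strength = "weak"
    elif size < 0.60:
        strength = "moderate"
    elif size < 0.80:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "negative" if r < 0 else "positive"
    return f"{strength} {direction}"


# HYPOTHETICAL paired scores for five students.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f} ({describe_correlation(r)} correlation)")
```

The label describes only the size of the relationship; whether it is statistically significant still has to be read from the p-value in the SPSS output.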
• Kent State University has a comprehensive guide to SPSS
• SPSS Tutorials: Andy Field’s video tutorials on SPSS can be helpful
• For a comprehensive guide, Andy Field’s Discovering Statistics Using SPSS is a good place to start and is available on the library site